-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add glob file type support #8006
Add glob file type support #8006
Conversation
We can avoid adding a new deoendy by using globset (which we already depend on) |
I still want to solve #7029. I think if we do go ahead with this then that could be a good opportunity for fixing that aswell. Technically globs can also match file extensions so we could theoretically replace all matching modes with globs for maximum consistency but that's sadly too much of a breaking change. Maybe one day when we migrate to script based config |
I didn't notice we already use globset, thanks for the pointer! I was considering trying to rewrite all the configs to use globbing for the file extension but I think it would add too many collisions. Maybe if we kept a separate map to try extension globs after the normal globs, but not sure how performant that would be. |
Yeah that would be too large of a breaking change. Breaking the suffix config is fine IMO, since it's pretty rarely used but for file extensions it would break a ton of user config. |
languages.toml
Outdated
@@ -2775,7 +2775,7 @@ source = { git = "https://github.com/kylegoetz/tree-sitter-unison", rev = "98c4e | |||
[[language]] | |||
name = "todotxt" | |||
scope = "text.todotxt" | |||
file-types = [{ suffix = ".todo.txt" }, "todotxt"] | |||
file-types = [{ glob = ".todo.txt" }, "todotxt"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is quite the same thing i believe the indention was to allow *.todo.txt
although I never use that format so I am not sure... Which reminds me... I think we need to check glob matches before file extensions to correctly handle cases like this (otherwise this would be detxted as a .txt file so plaintext)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good point, I'm also not sure what files this should match; this vim plugin seems to match todo.txt
or *.todo.txt
so maybe we should have a glob for both? Perhaps someone who uses the tool can provide some clarification.
Currently it tries to match exact filenames, then globs, then extensions (so we don't have collisions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jan9103 As you added todotxt, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think?
AFAIK the official syntax by todotxt.org is that any filename matching regex ^(.*\.)?todo\.txt$
should be considered.
Under the assumption glob uses unix glob syntax i have a few notes on [{ glob = "todo.txt" }, { glob = ".todo.txt" }, "todotxt"]
:
- the code looks as if
*/$GLOB_VALUE
(*/.todo.txt
) is matched as glob, but it should be*.todo.txt
todo.txt
should probably be matched, but its either the exact filename (and therefore dosn't need glob) or required a.
and therefore is already matched by{ glob = ".todo.txt" }
..todotxt
is a alternative extension sometimes used and therefore should get the same treatment as".todo.txt"
. but since its not in the official specs it would be fine to leave it out.
therefore this should probably work:
file-types = [{ glob = "{,*.}todo{,.}txt" }]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for filling jn the details. It should be [{ glob = "todo.txt" }, { glob = "*.todo.txt" }, "todotxt", "todo.txt"]
with the syntax from this PR. Plaintext now only matches file extensions with this PR, so you need a glob for the " only filename case". By default globs match relatively paths but if your glib starts with /
or *
it matches an absolute path instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
helix-core/src/syntax.rs
Outdated
.language_config_ids_by_glob | ||
.iter() | ||
.find_map(|(glob, id)| { | ||
if glob.compile_matcher().is_match(path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fairly inefficient, I the very least we should precompile the glob here but even if we do that matching every glob in sequence is fairly slow.
Instead you cab jse a GlobSet. Matching a globset is not much slow than matching a single Glob but matches all globs at once. You can use https://docs.rs/globset/latest/globset/struct.GlobSet.html#method.matches to get a list of all matching patterns. You should then select the longest pattern (if there are multiple matches altough usually there would only be one of then)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, sounds much more efficient!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it would make more sense to select the last pattern matched instead of longest pattern, since if there are conflicts it's likely because of someone's personal config. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the longest match makes sense as a tiebreaker. For example, if we switch to using globs for filextensions in the future then *.txt
and *.todo.txt
(example from above) would both match the same baths but .todo.txt
should be used because its longer (its matches are more specific). I can imagine something similar for other globs. I can't really think of a case where would would want to use the more general pattern over the more specific one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new implementation that uses GlobSet
; it got a little more complicated since we need to also build the GlobSet, but I think it's ok to handle that in Loader
as we only load it once at startup (or when language config changes).
@gjabell I don't think my Rust is good enough for me to review your code, but thanks for asking :) |
I'm neither good at rust nor familiar with helixes code(-style) and therefore don't want to give a official approval either, but the code looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am gonna take this for a spin soonish but looking trough the source code this lgtm
Hi all, just checking in to see if there's anything I need to do to make this PR ready (waiting on author tag is still added). If not please let me know once you're ready to merge so I can rebase it against the latest changes. |
The label was an accident/just forgot about it. we are currently in a soft feature freeze period leading up to the next release so this won't be merged until the release happens. |
This is a refactor that improves the error handling for creating the `helix_core::syntax::Loader` from the default and user language configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look through the languages.toml and I think everything that is a full filename has been converted to a { glob = "<filename>" }
👍
I just have a suggestion for starlark file-types but otherwise this looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for working on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great to see this finished! I think moving fully to globs for all ft detection in the future would be great so we don't have to keep different ft variants around (potentially once we switch away from toml config)
* languages: add docker-compose language it uses docker-compose-langserver as lsp And yaml for syntax highlighting, indents and injections * languages: add luajit as a shebang of lua This helps to provide syntax highlighting and other lua goodies when writing luajit * book(update): run cargo xtask docgen * since #8006 full filenames uses glob
* languages: add docker-compose language it uses docker-compose-langserver as lsp And yaml for syntax highlighting, indents and injections * languages: add luajit as a shebang of lua This helps to provide syntax highlighting and other lua goodies when writing luajit * book(update): run cargo xtask docgen * since helix-editor#8006 full filenames uses glob
* Replace FileType::Suffix with FileType::Glob Suffix is rather limited and cannot be used to match files which have semantic meaning based on location + file type (for example, Github Action workflow files). This patch adds support for a Glob FileType to replace Suffix, which encompasses the existing behavior & adds additional file matching functionality. Globs are standard Unix-style path globs, which are matched against the absolute path of the file. If the configured glob for a language is a relative glob (that is, it isn't an absolute path or already starts with a glob pattern), a glob pattern will be prepended to allow matching relative paths from any directory. The order of file type matching is also updated to first match on globs and then on extension. This is necessary as most cases where glob-matching is useful will have already been matched by an extension if glob matching is done last. * Convert file-types suffixes to globs * Use globs for filename matching Trying to match the file-type raw strings against both filename and extension leads to files with the same name as the extension having the incorrect syntax. * Match dockerfiles with suffixes It's common practice to add a suffix to dockerfiles based on their context, e.g. `Dockerfile.dev`, `Dockerfile.prod`, etc. * Make env filetype matching more generic Match on `.env` or any `.env.*` files. * Update docs * Use GlobSet to match all file type globs at once * Update todo.txt glob patterns * Consolidate language Configuration and Loader creation This is a refactor that improves the error handling for creating the `helix_core::syntax::Loader` from the default and user language configuration. * Fix integration tests * Add additional starlark file-type glob --------- Co-authored-by: Michael Davis <[email protected]>
* Replace FileType::Suffix with FileType::Glob Suffix is rather limited and cannot be used to match files which have semantic meaning based on location + file type (for example, Github Action workflow files). This patch adds support for a Glob FileType to replace Suffix, which encompasses the existing behavior & adds additional file matching functionality. Globs are standard Unix-style path globs, which are matched against the absolute path of the file. If the configured glob for a language is a relative glob (that is, it isn't an absolute path or already starts with a glob pattern), a glob pattern will be prepended to allow matching relative paths from any directory. The order of file type matching is also updated to first match on globs and then on extension. This is necessary as most cases where glob-matching is useful will have already been matched by an extension if glob matching is done last. * Convert file-types suffixes to globs * Use globs for filename matching Trying to match the file-type raw strings against both filename and extension leads to files with the same name as the extension having the incorrect syntax. * Match dockerfiles with suffixes It's common practice to add a suffix to dockerfiles based on their context, e.g. `Dockerfile.dev`, `Dockerfile.prod`, etc. * Make env filetype matching more generic Match on `.env` or any `.env.*` files. * Update docs * Use GlobSet to match all file type globs at once * Update todo.txt glob patterns * Consolidate language Configuration and Loader creation This is a refactor that improves the error handling for creating the `helix_core::syntax::Loader` from the default and user language configuration. * Fix integration tests * Add additional starlark file-type glob --------- Co-authored-by: Michael Davis <[email protected]>
* languages: add docker-compose language it uses docker-compose-langserver as lsp And yaml for syntax highlighting, indents and injections * languages: add luajit as a shebang of lua This helps to provide syntax highlighting and other lua goodies when writing luajit * book(update): run cargo xtask docgen * since helix-editor#8006 full filenames uses glob
* Replace FileType::Suffix with FileType::Glob Suffix is rather limited and cannot be used to match files which have semantic meaning based on location + file type (for example, Github Action workflow files). This patch adds support for a Glob FileType to replace Suffix, which encompasses the existing behavior & adds additional file matching functionality. Globs are standard Unix-style path globs, which are matched against the absolute path of the file. If the configured glob for a language is a relative glob (that is, it isn't an absolute path or already starts with a glob pattern), a glob pattern will be prepended to allow matching relative paths from any directory. The order of file type matching is also updated to first match on globs and then on extension. This is necessary as most cases where glob-matching is useful will have already been matched by an extension if glob matching is done last. * Convert file-types suffixes to globs * Use globs for filename matching Trying to match the file-type raw strings against both filename and extension leads to files with the same name as the extension having the incorrect syntax. * Match dockerfiles with suffixes It's common practice to add a suffix to dockerfiles based on their context, e.g. `Dockerfile.dev`, `Dockerfile.prod`, etc. * Make env filetype matching more generic Match on `.env` or any `.env.*` files. * Update docs * Use GlobSet to match all file type globs at once * Update todo.txt glob patterns * Consolidate language Configuration and Loader creation This is a refactor that improves the error handling for creating the `helix_core::syntax::Loader` from the default and user language configuration. * Fix integration tests * Add additional starlark file-type glob --------- Co-authored-by: Michael Davis <[email protected]>
* languages: add docker-compose language it uses docker-compose-langserver as lsp And yaml for syntax highlighting, indents and injections * languages: add luajit as a shebang of lua This helps to provide syntax highlighting and other lua goodies when writing luajit * book(update): run cargo xtask docgen * since helix-editor#8006 full filenames uses glob
* Replace FileType::Suffix with FileType::Glob Suffix is rather limited and cannot be used to match files which have semantic meaning based on location + file type (for example, Github Action workflow files). This patch adds support for a Glob FileType to replace Suffix, which encompasses the existing behavior & adds additional file matching functionality. Globs are standard Unix-style path globs, which are matched against the absolute path of the file. If the configured glob for a language is a relative glob (that is, it isn't an absolute path or already starts with a glob pattern), a glob pattern will be prepended to allow matching relative paths from any directory. The order of file type matching is also updated to first match on globs and then on extension. This is necessary as most cases where glob-matching is useful will have already been matched by an extension if glob matching is done last. * Convert file-types suffixes to globs * Use globs for filename matching Trying to match the file-type raw strings against both filename and extension leads to files with the same name as the extension having the incorrect syntax. * Match dockerfiles with suffixes It's common practice to add a suffix to dockerfiles based on their context, e.g. `Dockerfile.dev`, `Dockerfile.prod`, etc. * Make env filetype matching more generic Match on `.env` or any `.env.*` files. * Update docs * Use GlobSet to match all file type globs at once * Update todo.txt glob patterns * Consolidate language Configuration and Loader creation This is a refactor that improves the error handling for creating the `helix_core::syntax::Loader` from the default and user language configuration. * Fix integration tests * Add additional starlark file-type glob --------- Co-authored-by: Michael Davis <[email protected]>
* languages: add docker-compose language it uses docker-compose-langserver as lsp And yaml for syntax highlighting, indents and injections * languages: add luajit as a shebang of lua This helps to provide syntax highlighting and other lua goodies when writing luajit * book(update): run cargo xtask docgen * since helix-editor#8006 full filenames uses glob
* Replace FileType::Suffix with FileType::Glob Suffix is rather limited and cannot be used to match files which have semantic meaning based on location + file type (for example, Github Action workflow files). This patch adds support for a Glob FileType to replace Suffix, which encompasses the existing behavior & adds additional file matching functionality. Globs are standard Unix-style path globs, which are matched against the absolute path of the file. If the configured glob for a language is a relative glob (that is, it isn't an absolute path or already starts with a glob pattern), a glob pattern will be prepended to allow matching relative paths from any directory. The order of file type matching is also updated to first match on globs and then on extension. This is necessary as most cases where glob-matching is useful will have already been matched by an extension if glob matching is done last. * Convert file-types suffixes to globs * Use globs for filename matching Trying to match the file-type raw strings against both filename and extension leads to files with the same name as the extension having the incorrect syntax. * Match dockerfiles with suffixes It's common practice to add a suffix to dockerfiles based on their context, e.g. `Dockerfile.dev`, `Dockerfile.prod`, etc. * Make env filetype matching more generic Match on `.env` or any `.env.*` files. * Update docs * Use GlobSet to match all file type globs at once * Update todo.txt glob patterns * Consolidate language Configuration and Loader creation This is a refactor that improves the error handling for creating the `helix_core::syntax::Loader` from the default and user language configuration. * Fix integration tests * Add additional starlark file-type glob --------- Co-authored-by: Michael Davis <[email protected]>
* languages: add docker-compose language it uses docker-compose-langserver as lsp And yaml for syntax highlighting, indents and injections * languages: add luajit as a shebang of lua This helps to provide syntax highlighting and other lua goodies when writing luajit * book(update): run cargo xtask docgen * since helix-editor#8006 full filenames uses glob
One of the issues I've run into with Helix (coming from nvim) is the ability to specify file types based on a full path pattern and not just a suffix. This is necessary for file types like YAML which might need different tooling based on their filesystem context (examples include GitHub Actions and Helm charts) or things like Dockerfile (a common paradigm is to add a suffix for different use cases, e.g.
Dockerfile.dev
).It seems that this issue was already discussed in #2455; not sure if there are plans to introduce a different paradigm for file type matching which solves it without globs but I think it's a rather necessary feature either way. Other options would include:
I'm new to Rust and to Helix, so please let me know where improvements can be made or where I'm missing something 🙂
Fixes #7422?